Uninitialized Fingerprints in binary_fuse filters. #56

SirTyson · 2024-07-11T05:54:27Z

In both versions of binary_fuse*_allocate, filter->Fingerprints is allocated but left uninitialized. During binary_fuse*_populate, these uninitialized values are read when setting the filter:

filter->Fingerprints[h012[found]] = xor2 ^
    filter->Fingerprints[h012[found + 1]] ^
    filter->Fingerprints[h012[found + 2]];

This change zero-initializes filter->Fingerprints. I believe this is correct: in "Binary Fuse Filters: Fast and Smaller Than Xor Filters," H is defined as a zero-initialized array in Algorithm 1.
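For illustration, here is a minimal sketch of the kind of change described above, assuming a simplified filter struct. The names demo_filter_t, demo_allocate and demo_free are hypothetical and are not the library's API; only the idea of swapping malloc for calloc comes from this PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the library's filter struct. */
typedef struct {
    uint32_t ArrayLength;
    uint8_t *Fingerprints;
} demo_filter_t;

/* Allocate the fingerprint array zero-filled (calloc) instead of
 * uninitialized (malloc). Returns false if the allocation fails. */
static bool demo_allocate(uint32_t array_length, demo_filter_t *filter) {
    assert(filter != NULL);
    filter->ArrayLength = array_length;
    filter->Fingerprints = (uint8_t *)calloc(array_length, sizeof(uint8_t));
    return filter->Fingerprints != NULL;
}

/* Release the array and reset the struct to an empty state. */
static void demo_free(demo_filter_t *filter) {
    assert(filter != NULL);
    free(filter->Fingerprints);
    filter->Fingerprints = NULL;
    filter->ArrayLength = 0;
}
```

With calloc, every slot reads as 0 before populate runs, so any slot that populate never writes has a defined value.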

thomasmueller · 2024-07-11T06:09:11Z

I think you are right. The fingerprints are not set to zero (except when deserializing from another source). This doesn't affect functionality, false positive rate, etc., because it doesn't matter what data is stored initially in this array. But I would also expect the array to be initialized to zero. This could be done in the allocate method (as you have done), or later in the populate method. If done in allocate, then it will be overwritten in deserialize, which I think is not a problem.

So, I'm OK with this patch.
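To make the "initial data doesn't matter" argument concrete, here is a toy sketch, not the library's code. The type toy_key_t, the functions toy_assign and toy_contains, and the three hand-picked keys are all hypothetical. It assumes the keys are listed in a valid peel order, so each written slot is not used by any key assigned before it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy key: a fingerprint, its three slots, and which of the
 * three slots (index 0..2) the peeling step chose to write. */
typedef struct {
    uint8_t fp;
    uint32_t slot[3];
    uint32_t found;
} toy_key_t;

/* Assign fingerprints in reverse peel order. `keys` is assumed to be
 * listed in the order the keys were peeled. */
static void toy_assign(uint8_t *fingerprints, const toy_key_t *keys, size_t n) {
    assert(fingerprints != NULL && keys != NULL);
    for (size_t i = n; i-- > 0;) {
        const toy_key_t *k = &keys[i];
        uint32_t a = k->slot[(k->found + 1) % 3];
        uint32_t b = k->slot[(k->found + 2) % 3];
        /* The two other slots are read as-is, whatever they contain. */
        fingerprints[k->slot[k->found]] =
            (uint8_t)(k->fp ^ fingerprints[a] ^ fingerprints[b]);
    }
}

/* Membership test: the XOR of the three slots must equal the fingerprint. */
static bool toy_contains(const uint8_t *fingerprints, const toy_key_t *k) {
    assert(fingerprints != NULL && k != NULL);
    return k->fp == (uint8_t)(fingerprints[k->slot[0]] ^
                              fingerprints[k->slot[1]] ^
                              fingerprints[k->slot[2]]);
}

/* Three keys over six slots, in a valid peel order (writes slots 0, 3, 5). */
static const toy_key_t TOY_KEYS[3] = {
    {0x11, {0, 1, 2}, 0},
    {0x22, {1, 2, 3}, 2},
    {0x33, {2, 4, 5}, 2},
};
```

Running toy_assign on an array pre-filled with arbitrary bytes and on one pre-filled with zeros leaves all three keys reported as present in both cases, since each written slot is defined relative to whatever the other two slots already hold.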

lemire · 2024-07-11T12:09:40Z

@SirTyson I chose malloc deliberately but I don't object to this PR. I expect that the overhead of calloc over malloc is small.

SirTyson · 2024-07-12T18:56:23Z

While I agree that this does not affect false positive rate or correctness, I still think it is a helpful change. In particular, I'm using this library in a blockchain application, where the filter will be part of the ledger and must be bit-for-bit consistent when constructed on separate nodes. Currently, even with the same initial seed, two nodes can produce different filters at the bit level. While the overall false positive rate is unchanged, I think (but am not an expert on the topic and could be wrong) that two filters constructed with the same initial seed will hit false positives on different keys, which is also not acceptable in a blockchain application.
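A toy sketch of the determinism concern, under stated assumptions: toy_build and toy_query_absent are hypothetical helpers, the `fill` byte stands in for whatever an uninitialized allocation happened to contain, and the three keys and the absent key are hand-picked for illustration. It is not the library's code and says nothing about how often this happens in a real filter.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_SLOTS 6

/* Hypothetical toy build: pre-fill the array with `fill` (standing in for
 * leftover heap contents), then run the same three assignments. */
static void toy_build(uint8_t fill, uint8_t out[TOY_SLOTS]) {
    assert(out != NULL);
    memset(out, fill, TOY_SLOTS);
    /* Reverse peel order for keys with slots {2,4,5}, {1,2,3}, {0,1,2}. */
    out[5] = (uint8_t)(0x33 ^ out[2] ^ out[4]);
    out[3] = (uint8_t)(0x22 ^ out[1] ^ out[2]);
    out[0] = (uint8_t)(0x11 ^ out[1] ^ out[2]);
}

/* Query a key that was never inserted: fingerprint 0x00, slots {1, 2, 4}.
 * Those slots are never written by toy_build, so they keep the fill byte. */
static int toy_query_absent(const uint8_t f[TOY_SLOTS]) {
    assert(f != NULL);
    return 0x00 == (f[1] ^ f[2] ^ f[4]);
}
```

In this toy model, two builds with the same fill compare equal under memcmp, while builds with fills 0x00 and 0xAB differ in the untouched slots, and the absent key is a false positive in one build but not in the other. Zero-filling at allocation removes that source of variation.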

Fixed unitialized Fingerprints bug

0ed5f0d

lemire merged commit 3c0fd15 into FastFilter:master Jul 12, 2024
5 checks passed
